Reconfigurable computing

Reconfigurable computing is a computer architecture that combines some of the flexibility of software with the high performance of hardware by processing on very flexible, high-speed computing fabrics such as field-programmable gate arrays (FPGAs). The principal difference compared with ordinary microprocessors is the ability to make substantial changes to the datapath itself, in addition to the control flow. The main difference compared with custom hardware, i.e. application-specific integrated circuits (ASICs), is the possibility of adapting the hardware at run time by "loading" a new circuit onto the reconfigurable fabric.


History and properties

The concept of reconfigurable computing has existed since the 1960s, when Gerald Estrin's landmark paper proposed the concept of a computer made of a standard processor and an array of "reconfigurable" hardware.[1][2] The main processor would control the behavior of the reconfigurable hardware. The latter would then be tailored to perform a specific task, such as image processing or pattern matching, as quickly as a dedicated piece of hardware. Once the task was done, the hardware could be adjusted to do some other task. This resulted in a hybrid computer structure combining the flexibility of software with the speed of hardware; unfortunately, the idea was far ahead of the electronic technology available at the time.

In the 1980s and 1990s there was a renaissance in this area of research, with many reconfigurable architectures proposed in industry and academia,[3] such as COPACOBANA, Matrix, Garp,[4] Elixent, PACT XPP, Silicon Hive, Montium, Pleiades, Morphosys and PiCoGA.[5] Such designs became feasible thanks to the constant progress of silicon technology, which allowed complex designs to be implemented on a single chip. The world's first commercial reconfigurable computer, the Algotronix CHS2X4, was completed in 1991. It was not a commercial success, but it was promising enough that Xilinx (the inventor of the field-programmable gate array, FPGA) bought the technology and hired the Algotronix staff.[6]

Reconfigurable computing as a paradigm shift: using the Anti Machine

Table 1: Nick Tredennick’s Paradigm Classification Scheme

Paradigm                         | Resources | Programming source         | Algorithms | Programming source
Early historic computers         | fixed     | none                       | fixed      | none
von Neumann computer             | fixed     | none                       | variable   | Software (instruction streams)
Reconfigurable computing systems | variable  | Configware (configuration) | variable   | Flowware (data streams)

Computer scientist Reiner Hartenstein describes reconfigurable computing in terms of an anti machine that, according to him, represents a fundamental paradigm shift away from the more conventional von Neumann machine.[7] Hartenstein calls the following observation the Reconfigurable Computing Paradox: software-to-configware migration (software-to-FPGA migration) yields reported speed-up factors that in some cases exceed four orders of magnitude, as well as reductions in electricity consumption of up to almost four orders of magnitude, even though the technological parameters of FPGAs lag behind the Gordon Moore curve by about four orders of magnitude and their clock frequency is substantially lower than that of microprocessors. The paradox is explained by the paradigm shift, and partly also by the von Neumann syndrome.

The fundamental model of the reconfigurable computing machine paradigm, the data-stream-based anti machine, is best illustrated by contrasting it with the machine paradigms that preceded it, as in Nick Tredennick's classification scheme of computing paradigms (see "Table 1: Nick Tredennick’s Paradigm Classification Scheme").[8]

The fundamental model of a reconfigurable computing machine, the data-stream-based anti machine (also called Xputer), is the counterpart of the instruction-stream-based von Neumann machine paradigm. This is illustrated by a simple reconfigurable system (one that is not dynamically reconfigurable), which performs no instruction fetch at run time. The reconfiguration (before run time) can be considered a kind of super instruction fetch. An anti machine does not have a program counter; because it is driven by data streams, it has data counters instead. The term data stream is adopted here from the systolic array community, where it specifies at which time which data item has to enter or leave which port, in this case a port of the reconfigurable system, which may be fine-grained (e.g. using FPGAs), coarse-grained, or a mixture of both.

The systolic array community, originally (early 1980s) made up mainly of mathematicians, defined only one half of the anti machine: the data path, that is, the systolic array (see also Super systolic array). It neither defined nor modeled a data sequencing methodology, considering it outside its scope to determine where the data streams come from or where they end up. The data sequencing part of the anti machine is modeled as distributed memory, preferably on chip, which consists of auto-sequencing memory (ASM) blocks. Each ASM block has a sequencer that includes a data counter. An example is the Generic Address Generator (GAG), which is a generalization of DMA.

Example of a streaming model of computation

Problem: We are given two arrays, A[] and B[], each holding 256 unsigned characters (bytes). We need to compute the array C[] such that C[i]=B[B[B[B[B[B[B[B[A[i]]]]]]]]], i.e. eight successive lookups in B starting from A[i]. Although this problem is hypothetical, similar problems arise in real applications.

Consider a software solution (C code) for the above problem:

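for (int i = 0; i < 256; i++) {
	unsigned char a = A[i];	/* unsigned, so every index stays in 0..255 */
	for (int j = 0; j < 8; j++)
		a = B[a];	/* eight dependent lookups */
	C[i] = a;
}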

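This program takes roughly 256*10*CPI CPU cycles, where CPI is the average number of cycles per instruction and 10 is an approximate number of instructions per element (for example, one load from A, eight dependent loads from B and one store to C if the inner loop is fully unrolled).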

Now consider the hardware implementation shown in the figure, say on an FPGA. Here, one element of the array 'A' is 'streamed' by a microprocessor into the circuit every cycle. The array 'B' is implemented as a ROM, perhaps in the BRAMs of the FPGA. The wires going into the ROMs labelled 'B' are the address lines, and the wires coming out carry the values stored in the ROM at that address. The blue boxes are registers used for storing temporary values. This is a pipeline: after an initial latency of eight cycles, it outputs one useful C[i] value every cycle. Hence the output is also a 'stream'.

The hardware implementation takes 256+8 cycles. Hence, counted in clock cycles, we can expect a speedup of about 10*CPI over the software implementation. In wall-clock time, however, the speedup is considerably lower, because the FPGA runs at a slower clock than the CPU.
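The estimate can be written out explicitly. As in the text, the software version is assumed to execute about 10 instructions per element; f_CPU and f_FPGA are symbols introduced here for the two clock frequencies and are not part of the original example.

```latex
\begin{align*}
T_\mathrm{sw} &\approx 256 \cdot 10 \cdot \mathrm{CPI} = 2560\,\mathrm{CPI} \ \text{cycles},
\qquad T_\mathrm{hw} = 256 + 8 = 264 \ \text{cycles},\\
\frac{T_\mathrm{sw}}{T_\mathrm{hw}} &\approx \frac{2560}{264}\,\mathrm{CPI} \approx 9.7\,\mathrm{CPI} \approx 10\,\mathrm{CPI}
\quad \text{(speedup in cycles)},\\
S_\mathrm{time} &= \frac{T_\mathrm{sw}/f_\mathrm{CPU}}{T_\mathrm{hw}/f_\mathrm{FPGA}}
\approx 9.7\,\mathrm{CPI} \cdot \frac{f_\mathrm{FPGA}}{f_\mathrm{CPU}}
\quad \text{(speedup in time)}.
\end{align*}
```

Since f_FPGA is lower than f_CPU, the ratio f_FPGA/f_CPU is below 1 and shrinks the cycle-count speedup accordingly.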

References

  1. Estrin, G. 2002. Reconfigurable computer origins: the UCLA fixed-plus-variable (F+V) structure computer. IEEE Ann. Hist. Comput. 24, 4 (Oct. 2002), 3–9. DOI: http://dx.doi.org/10.1109/MAHC.2002.1114865
  2. Estrin, G., "Organization of Computer Systems—The Fixed Plus Variable Structure Computer," Proc. Western Joint Computer Conference, New York, 1960, pp. 33–40.
  3. C. Bobda: Introduction to Reconfigurable Computing: Architectures; Springer, 2007.
  4. Hauser, John R. and Wawrzynek, John, "Garp: A MIPS Processor with a Reconfigurable Coprocessor," Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '97, April 16–18, 1997), pp. 24–33.
  5. Campi, F.; Toma, M.; Lodi, A.; Cappelli, A.; Canegallo, R.; Guerrieri, R., "A VLIW processor with reconfigurable instruction set for embedded applications," 2003 IEEE International Solid-State Circuits Conference (ISSCC 2003), Digest of Technical Papers, vol. 1, pp. 250–491, 2003.
  6. Algotronix History.
  7. Hartenstein, R. 2001. A decade of reconfigurable computing: a visionary retrospective. In Proceedings of the Conference on Design, Automation and Test in Europe (DATE 2001) (Munich, Germany). W. Nebel and A. Jerraya, Eds. IEEE Press, Piscataway, NJ, 642–649.
  8. N. Tredennick: The Case for Reconfigurable Computing; Microprocessor Report, Vol. 10 No. 10, 5 August 1996, pp. 25–27.
